Compound decomposition in Dutch large vocabulary speech recognition
Abstract
This paper addresses compound splitting for Dutch in the context of broadcast news transcription. Language models were built from the original texts and from versions of those texts that had been decomposed with a data-driven compound splitting algorithm. The models were compared in terms of out-of-vocabulary (OOV) rates and word error rates (WER) on a real-world broadcast news transcription task. Compound splitting was found to improve automatic speech recognition (ASR) performance, and the best results were obtained when frequent compounds were left undecomposed.
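The comparison described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify the splitting method, so the example assumes a hypothetical two-part splitter that accepts a split only when both parts already occur in the training vocabulary, and a hypothetical frequency threshold (`min_keep_count`) for compounds that stay whole. The word lists are toy data chosen for illustration, not results from the paper. The OOV rate is the usual one: the fraction of test tokens that are missing from the vocabulary.

```python
from collections import Counter


def split_compound(word, lexicon, min_part=3):
    """Hypothetical splitter: return [head, tail] for the first split point
    where both parts are in `lexicon`; otherwise return [word] unchanged."""
    for i in range(min_part, len(word) - min_part + 1):
        head, tail = word[:i], word[i:]
        if head in lexicon and tail in lexicon:
            return [head, tail]
    return [word]


def decompose(tokens, lexicon, keep):
    """Split every token except those in `keep` (the frequent compounds)."""
    out = []
    for tok in tokens:
        if tok in keep:
            out.append(tok)
        else:
            out.extend(split_compound(tok, lexicon))
    return out


def oov_rate(test_tokens, vocabulary):
    """Fraction of test tokens that do not occur in the vocabulary."""
    if not test_tokens:
        return 0.0
    missing = sum(1 for tok in test_tokens if tok not in vocabulary)
    return missing / len(test_tokens)


# Toy data (illustrative only).
train = ["nieuws", "uitzending", "nieuws", "spraak", "herkenning",
         "spraakherkenning", "spraakherkenning"]
test = ["nieuwsuitzending", "spraakherkenning"]

# Baseline: vocabulary taken from the original training text.
lexicon = set(train)
baseline_oov = oov_rate(test, lexicon)

# Decomposed variant: words seen at least `min_keep_count` times stay whole.
min_keep_count = 2  # assumed threshold, not taken from the paper
counts = Counter(train)
keep = {w for w, c in counts.items() if c >= min_keep_count}

train_split = decompose(train, lexicon, keep)
test_split = decompose(test, lexicon, keep)
split_oov = oov_rate(test_split, set(train_split))

print(f"baseline OOV rate:   {baseline_oov:.2f}")
print(f"decomposed OOV rate: {split_oov:.2f}")
```

In this toy setup the unseen compound "nieuwsuitzending" is covered once it is split into parts that were seen in training, while the frequent compound "spraakherkenning" stays a single vocabulary entry. A real evaluation would also need the word error rate, which requires running a recognizer and is outside the scope of this sketch.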
Similar resources
Compound decomposition in Dutch large vocabulary speech recognition
This paper addresses compound splitting for Dutch in the context of broadcast news transcription. Language models were created using original text versions and text versions that were decomposed using a data-driven compound splitting algorithm. Language model performances were compared in terms of out-of-vocabulary rates and word error rates in a real-world broadcast news transcription task. It ...
PHOTOCHEMICAL DECOMPOSITION OF 3-CHLORO-CROTON ALDEHYDE TOSYLHYDRAZONE SODIUM SALT
The low-temperature photochemical decomposition of 3-chloro-croton aldehyde tosylhydrazone sodium salt in tetrahydrofuran results in the formation of 3-chloro-3-methyl cyclopropene and 5-chloro-5-methyl-pyrazolenine. In the presence of moisture, the photohydrated compound 3 was obtained as one of the products.
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
Unsupervised Compound Splitting With Distributional Semantics Rivals Supervised Methods
In this paper we present a word decompounding method that is based on distributional semantics. Our method does not require any linguistic knowledge and is initialized using a large monolingual corpus. The core idea of our approach is that parts of compounds (like “candle” and “stick”) are semantically similar to the entire compound, which helps to exclude spurious splits (like “candles” and “t...
Creating a Dutch testbed to evaluate the retrieval from textual databases
This paper describes the first large-scale evaluation of information retrieval systems using Dutch documents and queries. We describe in detail the characteristics of the Dutch test data, which is part of the official CLEF multilingual textual database, and give an overview of the experimental results of companies and research institutions that participated in the first official Dutch CLEF exp...